Quality of Systematic Reviews of Observational Nontherapeutic Studies

Introduction: High-quality epidemiologic research is essential in reducing chronic diseases. We analyzed the quality of systematic reviews of observational nontherapeutic studies.
Methods: We searched several databases for systematic reviews of observational nontherapeutic studies that examined the prevalence of or risk factors for chronic diseases and were published in core clinical journals from 1966 through June 2008. We analyzed the quality of such reviews by using prespecified criteria and internal quality evaluation of the included studies.
Results: Of the 145 systematic reviews we found, fewer than half met each quality criterion; 49% reported study flow, 27% assessed gray literature, 2% abstracted sponsorship of individual studies, and none abstracted the disclosure of conflict of interest by the authors of individual studies. Planned, formal internal quality evaluation of included studies was reported in 37% of systematic reviews. The journal of publication, topic of review, sponsorship, and conflict of interest were not associated with better quality. Odds of formal internal quality evaluation (odds ratio [OR], 1.10 per year; 95% confidence interval [CI], 1.02-1.19) and either planned, formal internal quality evaluation or abstraction of quality criteria of included studies (OR, 1.17 per year; 95% CI, 1.08-1.26) increased over time, without positive trends in other quality criteria from 1990 through June 2008. Systematic reviews with internal quality evaluation did not meet other quality criteria more often than those that ignored the quality of included studies.
Conclusion: Collaborative efforts from investigators and journal editors are needed to improve the quality of systematic reviews.


Introduction
Valid epidemiologic research is essential in preventing chronic diseases (1-3). Assessing the quality of observational studies is an important part of evidence synthesis (4). Systematic reviews have become key tools in evidence synthesis from a growing number of epidemiologic studies (5). Producing high-quality systematic reviews is essential to developing generalizable and actionable conclusions (6,7). Quality criteria for systematic reviews have been proposed by working groups that developed the Meta-analysis of Observational Studies in Epidemiology (MOOSE), Strengthening the Reporting of Observational Studies in Epidemiology (STROBE), and a measurement tool for assessment of multiple systematic reviews (AMSTAR) (8-12). The working groups and the Cochrane handbook (13) addressed the criteria most closely tied to biased results in systematic reviews, including bias in the selection of studies or of information within studies by the reviewers (14-18) and bias in the publication of positive significant results (6,15,19,20).
Previous research and guidelines (13,21-23) focus on systematic reviews of interventional therapeutic studies. Validity of observational nontherapeutic studies of prevalence of chronic diseases or risk factors for diseases is essential for effective preventive public health actions (24,25). Our aim was to evaluate the quality of systematic reviews of observational nontherapeutic studies that examined the incidence and prevalence of chronic conditions and risk factors for diseases. The criteria we used to determine the reporting and methodologic quality in systematic reviews were from published standards (8-12). We hypothesized that the quality of systematic reviews differs by the time when the study was published, the country in which the study was conducted, the journal of publication, the sponsorship of the study, and whether a conflict of interest was disclosed. We also hypothesized that systematic reviews with internal quality evaluation of the included studies would have better quality, demonstrating commitment to quality of evidence.

Data sources
We searched MEDLINE via PubMed and via Ovid MEDLINE, the Cochrane Library (26) and working groups, WorldCat (27), and Scirus (28) to find systematic reviews of observational nontherapeutic studies published in English from 1966 through June 2008 in core clinical journals (exact search string is listed in Appendix Table 1). We used the definitions of core clinical journals from the Abridged Index Medicus (119 indexed titles). We defined observational nontherapeutic studies as observations of patient outcomes that did not examine procedures concerned with the remedial treatment or prevention of diseases (29).

Study selection
Three investigators independently decided on the eligibility of the studies according to recommendations from the Cochrane Handbook for Systematic Reviews of Interventions (13). We reviewed abstracts to exclude comments, expert opinions, letters, case reports, systematic reviews of interventional studies, and systematic reviews of studies of diagnostic accuracy of tests.

Data extraction
Evaluations of the studies and data extraction were performed independently by 2 researchers. Predefined categorical responses to the checklist items were abstracted into our spreadsheet. Errors in data extraction were assessed by a comparison of the data charts with the original articles (13,30). Any discrepancies were discussed and resolved. The quality criteria that we abstracted were based on guidelines for determining the reporting and methodologic quality of systematic reviews (8-12).
To evaluate selection bias, we abstracted whether the authors of systematic reviews described the search strategy (yes, no, or partially); yes indicated that the authors reported time periods of searches, searched databases, and exact search string. We abstracted whether the authors of systematic reviews described study flow (yes, no, or partially); yes indicated that the authors reported the list of retrieved citations, the list of excluded studies, and justification for exclusion.
We abstracted as dichotomous variables whether the authors of systematic reviews did any of the following:
• Stated the aim of the review and the primary and secondary hypotheses of the review.
We abstracted how the authors of systematic reviews described the statistical methods used, with justification, and the models for pooling (fixed or random effects) in sufficient detail to be replicated (no pooling, random, or fixed). We abstracted whether the authors of pooling analyses reported statistical tests for heterogeneity and whether heterogeneity was statistically significant (not reported, not significant, or significant).
We used 3 categories to classify whether the authors of systematic reviews had evaluated the quality of included studies by using developed or previously published checklists or scales (31): 1) the authors stated planned, formal internal quality evaluations; 2) the authors abstracted selected criteria of external or internal validity without using a planned, formal, and comprehensive internal quality evaluation; and 3) the authors did not conduct internal quality evaluations. We further grouped the reviews that evaluated any quality criteria so that we could compare them with reviews that did not mention internal quality evaluation of the included studies. We also compared studies with and without planned, formal internal quality evaluation. We abstracted blinding and reliability testing of internal quality evaluations as dichotomous responses (reported or not reported).
The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the US Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions. Use of trade names is for identification only and does not imply endorsement by any of the groups named above.
We abstracted several explanatory variables that could be related to the quality of systematic reviews:
• The year of publication, defined as a continuous variable. We created categories of 4- or 5-year periods: 1990 to 1994, 1995 to 1999, 2000 to 2004, and 2005 through June 2008.
• The journals of publication.
• The country where the systematic reviews were performed.
• The sponsorship of the reviews. Those that had either governmental or foundation support or were fellowships were defined as having nonprofit support.
• The disclosure of conflict of interest by authors of reviews (either not disclosed, disclosed as no conflict of interest, or disclosed conflict of interest).
• The number of disclosed relationships with industry, defined as a continuous variable.
• The sponsor's participation in data collection, analysis, and interpretation of the results of the review.
• The review outcomes as risk factors for prevalence or incidence of chronic conditions or diseases.
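The abstraction scheme described above can be pictured as one record per review with predefined categorical responses. The following Python sketch is illustrative only: the authors abstracted data into a spreadsheet, and every name here (`Reported`, `QualityEvaluation`, `ReviewRecord`, `publication_period`) is hypothetical. Returning `None` for years before 1990 is also an assumption, because the text defines periods only from 1990 onward.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reported(Enum):
    """Three-level response used for search strategy and study flow."""
    YES = "yes"
    NO = "no"
    PARTIALLY = "partially"


class QualityEvaluation(Enum):
    """The 3 categories of internal quality evaluation described in the text."""
    PLANNED_FORMAL = 1      # planned, formal internal quality evaluation
    SELECTED_CRITERIA = 2   # selected validity criteria abstracted, no formal evaluation
    NONE = 3                # no internal quality evaluation


def publication_period(year: int) -> Optional[str]:
    """Map a publication year to the 4- or 5-year period used in the analysis."""
    if 1990 <= year <= 1994:
        return "1990-1994"
    if 1995 <= year <= 1999:
        return "1995-1999"
    if 2000 <= year <= 2004:
        return "2000-2004"
    if 2005 <= year <= 2008:
        return "2005-2008"
    return None  # assumption: years outside 1990-2008 are left uncategorized


@dataclass
class ReviewRecord:
    """Hypothetical record for one abstracted systematic review."""
    year: int
    journal: str
    country: str
    search_strategy: Reported
    study_flow: Reported
    quality_evaluation: QualityEvaluation

    @property
    def period(self) -> Optional[str]:
        return publication_period(self.year)


# Made-up example record, not taken from the study data.
example = ReviewRecord(
    year=1997,
    journal="Hypothetical Journal",
    country="United States",
    search_strategy=Reported.PARTIALLY,
    study_flow=Reported.NO,
    quality_evaluation=QualityEvaluation.SELECTED_CRITERIA,
)
print(example.period)
```

Keeping the three-level and dichotomous responses as explicit enumerations, rather than free text, mirrors the use of predefined categorical responses and makes later cross-tabulation straightforward.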

Data synthesis
We summarized the results in evidence tables. We used prespecified categories of dependent and independent variables and did not force the data into binary categories for definitive tests of significance. We used univariate logistic regression to examine the association between internal quality evaluation and the year of publication, testing the association with the Wald test. Odds ratios (ORs) were calculated with binary logit models fitted by the Fisher scoring technique. We computed the fractions of systematic reviews meeting various quality criteria in each of the 4 time periods considered. The proportions of systematic reviews that met different levels of each quality criterion were evaluated by using χ² tests and, in cases of small numbers, Fisher's exact tests. We report 95% confidence intervals (CIs) and 2-sided P values; all calculations were performed with SAS version 9.1.3 (SAS Institute, Inc, Cary, North Carolina).
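To make the regression step concrete, the following is a minimal Python sketch of a univariate binary logit model fitted by Fisher scoring, with a Wald test and a 95% Wald CI for the OR per year. It is not the authors' SAS code. The function names and the synthetic data (8 reviews per year, with a rising count that report internal quality evaluation) are assumptions made purely for illustration.

```python
import math


def _score_and_information(b0, b1, xs, ys):
    """Score vector and Fisher information for logit(p) = b0 + b1 * x."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        u0 += y - p
        u1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return u0, u1, i00, i01, i11


def fit_univariate_logit(years, outcomes, max_iter=50, tol=1e-10):
    """Fit outcome ~ year by Fisher scoring; return OR per year, Wald CI, Wald P."""
    mean_year = sum(years) / len(years)
    xs = [yr - mean_year for yr in years]  # centering leaves the slope unchanged
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11 = _score_and_information(b0, b1, xs, outcomes)
        det = i00 * i11 - i01 * i01
        # Fisher scoring update: beta <- beta + I^{-1} U (2 x 2 inverse written out)
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard error of the slope from the inverse information at the estimate
    _, _, i00, i01, i11 = _score_and_information(b0, b1, xs, outcomes)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    z = b1 / se
    return {
        "or_per_year": math.exp(b1),
        "ci_low": math.exp(b1 - 1.96 * se),
        "ci_high": math.exp(b1 + 1.96 * se),
        "wald_p": math.erfc(abs(z) / math.sqrt(2.0)),  # 2-sided normal P value
    }


# Synthetic, made-up data: 8 reviews per year from 1990 through 2008.
years, outcomes = [], []
for k, year in enumerate(range(1990, 2009)):
    n_with = 1 + k // 4  # hypothetical number of reviews with evaluation
    years += [year] * 8
    outcomes += [1] * n_with + [0] * (8 - n_with)

result = fit_univariate_logit(years, outcomes)
print(result)
```

An OR per year above 1 with a CI that excludes 1 corresponds to the kind of positive time trend reported in the Results.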

Results
We found 145 eligible systematic reviews of observational nontherapeutic studies (study flow in the Appendix Figure) (32-176). The number of published systematic reviews increased from 17 during 1990-1994 to 56 during 2005-2008. Most of the studies were conducted in the United States (55 publications) or in the United Kingdom (28 publications) (Appendix Table 2). Half of the systematic reviews (73 publications) were funded by nonprofit organizations; 56 (39%) reviews did not publish their funding sources, 4 reviews received industry support, and 10 were sponsored jointly by industry and nonprofit organizations. Almost three-fourths (106) of the authors of systematic reviews did not disclose conflict of interest; 35 publications stated that the authors do not have any conflict of interest; and 4 studies were conducted by authors who reported conflict of interest. The studies were published in 49 journals. Most systematic reviews (122 studies) assessed risk factors for chronic diseases, 19 summarized estimates of prevalence or incidence, 2 studies reported prevalence and associations with risk factors, and 2 studies examined levels of risk factors. Most studies reported incidence and risk factors for cardiovascular diseases (46 studies) or cancer (26 studies).

Quality of systematic reviews
Fewer than half of the studies reported study flow (49%), assessed gray literature (27%), or addressed language bias (29%) (Table 1). Only 2% of reviews abstracted sponsorship of individual studies and none abstracted the disclosure of conflict of interest by the authors of individual studies that were eligible for the reviews. Pooling was performed in 137 studies; of these, 62% used a random effects model; 57% reported detecting significant heterogeneity across the studies; and 19% did not provide any information about statistical heterogeneity in pooled estimates. The proportion of systematic reviews that met quality criteria including study flow, assessment of gray literature, or the abstraction of funding sources of included studies did not show significant trends from 1990 through 2008. The proportion of systematic reviews that assessed language bias increased from 8% during 1995-1999

Internal quality evaluation
Planned and detailed quality assessment of included studies was reported in 37% of systematic reviews, and 18% abstracted more than 1 criterion of external or internal quality; significant positive trends were reported during the evaluated time (Table 1). Quality assessment was masked in 3 studies. Development of the appraisals, including references to previously published tools, was reported in 32 studies, but only 6 tested interobserver agreement for quality assessment.

Quality of systematic review by explanatory factors
The quality of systematic reviews did not differ much by study location or by the journal of publication. Systematic reviews of prevalence or incidence or risk factors of the diseases did not differ in their quality measures. Sponsorship was not associated with quality of the reviews. The role of conflict of interest was impossible to establish because the authors of 56 reviews did not disclose funding and authors of 106 reviews did not disclose conflict of interest.

Explanatory factors of internal quality evaluation of included studies
The journal of publication, topic of the review, and continent where the review was conducted were not associated with the likelihood of internal quality evaluation. Systematic reviews of risk factors tended to conduct internal quality evaluation of the included studies more often than reviews of incidence or prevalence or of levels of risk factors. Systematic reviews sponsored by nonprofit organizations conducted internal quality evaluations of individual studies more often than reviews that received corporate funding. Systematic reviews that disclosed conflict of interest conducted internal quality evaluation of individual studies less frequently (10 of 39 studies; 26%) than reviews with no disclosure (44 of 106 studies; 42%). Odds of formal internal quality evaluation (OR, 1.10 per year; 95% CI, 1.02-1.19) and either planned, formal internal quality evaluation or abstraction of quality criteria (OR, 1.17 per year; 95% CI, 1.08-1.26) increased over time. Disclosure of conflict of interest by the authors of systematic reviews was not associated with greater odds of internal quality evaluation.
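As a worked illustration of the comparison above, the counts given in the text (10 of 39 reviews with a conflict-of-interest statement vs 44 of 106 without one) can be arranged in a 2 × 2 table. The Python sketch below computes an OR with a Woolf (log-based) 95% CI and a 2-sided Fisher's exact test. The choice of the Woolf interval is an assumption, because the paper does not state how CIs for 2 × 2 comparisons were derived, and the function names are hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """OR and Woolf (log-based) CI for the 2 x 2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


def fisher_exact_two_sided(a, b, c, d):
    """2-sided Fisher's exact P value: sum of tables no more likely than observed."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

    p_obs = prob(a)
    k_min, k_max = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: disclosure statement present / absent.
# Columns: planned, formal internal quality evaluation yes / no.
a, b = 10, 39 - 10
c, d = 44, 106 - 44

odds_ratio, ci_low, ci_high = odds_ratio_woolf(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}, Fisher P = {p_value:.3f}")
```

Under these assumptions the OR is about 0.49 with a 95% CI of roughly 0.21 to 1.10. The interval includes 1, which is consistent with the statement that disclosure was not associated with greater odds of internal quality evaluation.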

Quality of systematic reviews by internal quality evaluation
Complete documentation of the literature search including time period, databases searched, and exact literature search strings was less common among reviews with planned, formal internal quality evaluation (48 studies, 35%) than among reviews without it (90 studies, 65%) (Table 2). However, reviews that either abstracted selected quality criteria or planned, formal internal quality evaluation reported partial (6 studies) or complete (74 studies) information about the literature search more often than studies that did not evaluate quality of included studies (64 studies). Reviews that did not justify exclusion of non-English studies ignored quality of individual studies more often (72 studies) than reviews with planned, formal internal quality evaluation (31 studies). The same pattern was present for publication bias: the reviews that did not mention gray literature also ignored the quality of individual studies. The reviews reporting attempts to contact the authors of included studies either performed planned, formal internal quality evaluation or abstracted selected quality criteria more often than reviews without such attempts (OR, 2.3; 95% CI, 1.1-4.7). Reviews with complete reporting of study flow performed planned, formal internal quality evaluation or abstracted quality criteria more often (51 studies) than reviews without study flows (20 studies). More than half of systematic reviews without planned, formal internal quality evaluation (44 studies) also did not report study flow.
The association between quality of systematic reviews and sponsor participation in the data collection, analyses, and interpretation was difficult to analyze because this information was either omitted or reported in various ways. Less than 10% of systematic reviews contained a clear statement that the sponsors did not play any role in gathering the studies or analyzing or interpreting the results and did not influence the content of the manuscript. Other reviews omitted mention of the role of the sponsor in approval of the manuscript or provided a general statement that sponsors did not influence the conclusions or the content of the paper. Two reviews included statements of unconditional or unrestricted sponsorship of the meta-analyses.

Discussion
Our analyses showed that less than half of the systematic reviews of nontherapeutic observational studies that were published in core clinical journals met each quality criterion. Quality of systematic reviews did not improve over time. Planned, formal internal quality evaluations of the included studies were reported in less than half of systematic reviews, but the prevalence of internal quality evaluations has increased during the last decade. Our findings are in concordance with previously published methodologic analyses of systematic reviews that also found inconsistent quality and incomplete internal quality evaluation of individual studies (6). Methodologic analyses of systematic reviews that focused on particular diseases or conditions demonstrated that half of the publications had major flaws in design and reporting. For instance, systematic reviews of therapies for renal diseases failed to assess the methodologic quality of included studies (177). Methodologic analyses of systematic reviews of interventions showed that 69% of those randomly selected in MEDLINE meta-analyses did not analyze quality of trials (22). Most (68%) systematic reviews of diagnostic tests for cancer did not provide formal assessments of study quality (178). We also found that the quality of reviews did not differ among types of studies (incidence or risk factors for diseases), types of diseases, or journal of publication.
Journal commitment to high-quality research, however, was associated with improved reporting quality of the publications. For example, adoption by journals of the Consolidated Standards of Reporting Trials (CONSORT) improved the quality of the publications of interventional studies (179,180). An endorsement of the developed standards for observational studies, including the MOOSE and STROBE checklists, may also improve quality of the publications. We did not analyze how many core clinical journals adopted these standards or how quality of the publications changed depending on this adoption. Peer review of submitted manuscripts should include quality assessment using validated tools (12).
We could not identify the factors that can explain differences in quality of systematic reviews. The role of sponsorship and conflict of interest could not be estimated because of poor reporting of this information. The quality and reliability of quality evaluation of the included studies is unclear because development of the appraisals was described in a small proportion of systematic reviews (32 of 80 studies), and only 6 of 80 studies tested interobserver agreement for quality assessment. We did not evaluate all reviews of observational studies that were published in epidemiologic journals. However, it is unlikely that the quality of reviews published in other journals would be better than those in core clinical journals. Future research should investigate the factors that can explain differences in the quality of systematic reviews.
Peer reviewed publications of high-quality systematic reviews can provide the best available research evidence for evidence-based public health (24). Evidence-based decisions can improve public health practice in preventing incidence and progression of chronic diseases (25). In our analysis, less than half of the systematic reviews of observational nontherapeutic studies met quality criteria established in the MOOSE, STROBE, and AMSTAR statements. Internal quality evaluation of included studies should be an essential part of evidence synthesis, but only half of the reviews reported such evaluation. Collaborative efforts from investigators and journal editors are needed to improve quality of systematic reviews.
Centers for Disease Control and Prevention • www.cdc.gov/pcd/issues/2010/nov/09_0195.htm
research fellow, for her statistical expertise in reliability testing; Susan Duval, PhD, for her help estimating sample size; Marilyn Eells for editing and formatting the report; and Nancy Russell, MLS, and Rebecca Schultz for their assistance gathering data from the experts and formatting the tables, and Christa Prodzinski for quality control of the data.

Author Information
Corresponding Author: Tatyana

Reliability of internal quality evaluation reported 2 1 2 .99
Internal quality evaluation was masked 1 1 0 1 .11
Abbreviation: NA, not applicable.
a P values for overall χ² test.

Documented partially 6 0 6 0
Heterogeneity was not significant 17 1 15 20
Heterogeneity was significant at least for one association 50 2 0 52
a P value for overall χ² test between planned, formal internal quality evaluation or abstraction of some quality criteria versus neither planned, formal internal quality evaluation nor abstraction of some quality criteria.
b P value for overall χ² test between planned, formal internal quality evaluation versus no planned, formal internal quality evaluation.

Appendix